The Analytical Policeman

In this lecture, we will discuss how visualization can offer insights in the area of policing in urban environments.

The explosion of computerized data affects all parts of society, including policing.

In the past, human judgment and experience was the only tool in identifying patterns in criminal behavior.

Police forces around the United States and the world are augmenting human judgment with analytics - sometimes described as predictive analysis.

These days the LAPD is on an offensive to prevent crime. Its latest weapon is a computer program that can actually predict where crimes will happen.

In the Foothill Division north of downtown Los Angeles, police are patrolling the largely working-class neighborhoods with specially marked maps. The small red squares are hot spots, where computers project property crimes are most likely.

It’s called predictive policing.

These crime prediction boxes come from the same kind of mathematical calculation used to predict earthquakes and aftershocks. By analyzing the times, dates, and places of recent crimes computers project hot spots for burglaries, break-ins, and car thefts.

LA’s Police Chief Charlie Beck says increasing police patrols inside those boxes denies criminals opportunity.The real measure of this is not how many people you catch, it’s how much crime you prevent.

The LAPD began testing the predictive policing model here in the Foothill Division in November, and the early results are encouraging. Burglaries are down 33%, and violent crime is also down 21%. T hat success will allow Beck to expand the program to other parts of the city and leverage limited resources.

The Los Angeles Police Chief Charlie Beck writes,

“I’m not going to get more money. I’m not going to get more cops. I have to be better at using what I have, and that’s what predictive policing is about. If this old street cop can change the way that he thinks about this stuff, then I know that my officers can do the same.”

Let me comment on the role of analytics.

The analytical tools you have learned in this class can be used to make these predictive policing models possible.

However, communicating the results of these models is critical. A linear regression output table will not be of use to a police person on patrol.

Visualization bridges the gap between the analytics and the end user.

Visualizing Crime Over Time

In almost every application, before we even consider a predictive model, we should try to understand the historical data.

Many cities in the United States and around the world, provide logs of reported crimes, usually the time, location, and nature of the event.

In this lecture, we’ll use data from the city of Chicago, in the United States, about motor vehicle thefts.

Given this data on crimes, suppose we wanted to communicate crime patterns over the course of an average week.

We could display daily crime averages using a line graph, like the one shown here, but this doesn’t seem too useful.

We can see that crime tends to be higher on Saturday, but when on Saturday, and where?

We could replace our x-axis with the hour of the day and have a different line for every day of the week to understand when crime occurs in more detail.

But this would be a jumbled mess with seven lines, and probably very hard to read.

We could instead use no visualization at all, and instead present information in a table.

For each hour and day, we have the total number of crimes that occurred. This is a valid representation of the data, but large tables of numbers can be hard to read and understand.

So how can we make the table more interesting and usable?

A great way to visualize information in a two-dimensional table is with a heatmap.

Heatmaps visualize data using three attributes. Two of the attributes are on the x and y-axes, typically displayed horizontally and vertically.

The third attribute is represented by shades of color.

In this example, lower values in the third attribute correspond to colors closer to blue, and higher values in the third attribute correspond to colors closer to red.

For example, the x-axis could be hours of the day, the y-axis could be days of the week, and the colors could correspond to the amount of crime.

In a heat map, we can pick different color schemes based on the type of data to convey different messages.

In crime, a yellow to red color scheme might be appropriate because it can highlight some of the more dangerous areas in red.

Your eye is naturally drawn to the red areas of the plot.

In other applications, both high and low values are meaningful, so having a more varied color scheme might be useful.

And in other applications, you might only want to see cells with high values, so you could use a gray scale to make the cells with low values white.

The x and y-axes in a heat map don’t need to be continuous.

In our example, we have a categorical or factor variable, the day of the week.

And we can even combine a heat map with a geographical map, which we’ll discuss later in this lecture.

This type of heat map is frequently used in predictive policing to show crime hot spots in a city.

In this lecture, we’ll use Chicago motor vehicle theft data to explore patterns of crime, both over days of the week, and over hours of the day.

We’re interested in analyzing the total number of car thefts that occur in any particular hour of a day of the week over our whole data set.

A Line Plot

Let’s start by reading in our data.

mvt = read.csv("mvt.csv", stringsAsFactors = FALSE) # since we have a text field
str(mvt)
## 'data.frame':    191641 obs. of  3 variables:
##  $ Date     : chr  "12/31/12 23:15" "12/31/12 22:00" "12/31/12 22:00" "12/31/12 22:00" ...
##  $ Latitude : num  41.8 41.9 42 41.8 41.8 ...
##  $ Longitude: num  -87.6 -87.7 -87.8 -87.7 -87.6 ...

We have over 190,000 observations of three different variables:

  • the date of the crime
  • the location of the crime, in terms of latitude and longitude.

We want to first convert the Date variable to a format that R will recognize so that we can extract the day of the week and the hour of the day.

We can do this using the strptime function. So we want to replace our variable, Date, with the output of the strptime function, which takes as a first argument our variable, Date, and then as a second argument the format that the date is in.

mvt$Date = strptime(mvt$Date, format="%m/%d/%y %H:%M")

In this format, we can extract the hour and the day of the week from the Date variable, and we can add these as new variables to our data frame. We can do this by first defining our new variable, Then, we just take the hour variable out of Date variable.

Sys.setlocale("LC_ALL", "C")
## [1] "LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
mvt$Weekday = weekdays(mvt$Date)
mvt$Hour = mvt$Date$hour
str(mvt)
## 'data.frame':    191641 obs. of  5 variables:
##  $ Date     : POSIXlt, format: "2012-12-31 23:15:00" "2012-12-31 22:00:00" ...
##  $ Latitude : num  41.8 41.9 42 41.8 41.8 ...
##  $ Longitude: num  -87.6 -87.7 -87.8 -87.7 -87.6 ...
##  $ Weekday  : chr  "Monday" "Monday" "Monday" "Monday" ...
##  $ Hour     : int  23 22 22 22 21 20 20 20 19 18 ...

Now, we have two more variables:

  • Weekday, which gives the day of the week
  • Hour, which gives the hour of the day.

Now, we’re ready to make some line plots. Let’s start by creating the line plot with just one line and a value for every day of the week.

We want to plot as that value the total number of crimes on each day of the week. We can get this information by creating a table of the Weekday variable.

table(mvt$Weekday)
## 
##    Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday 
##     29284     27397     27118     26316     27319     26791     27416

This gives the total amount of crime on each day of the week.

Let’s save this table as a data frame so that we can pass it to ggplot as our data.

WeekdayCounts = as.data.frame(table(mvt$Weekday))
str(WeekdayCounts)
## 'data.frame':    7 obs. of  2 variables:
##  $ Var1: Factor w/ 7 levels "Friday","Monday",..: 1 2 3 4 5 6 7
##  $ Freq: int  29284 27397 27118 26316 27319 26791 27416

We can see that our data frame has seven observations, one for each day of the week, and two different variables. The first variable, called Var1, gives the name of the day of the week, and the second variable, called Freq, for frequency, gives the total amount of crime on that day of the week.

Now, we’re ready to make our plot. First, we need to load the ggplot2 package.

library(ggplot2)

Now, we’ll create our plot using the ggplot function.

ggplot(WeekdayCounts, aes(x = Var1, y = Freq)) + geom_line(aes(group=1)) # this groups all of data into one line

We can see that this is very close to the plot we want. We have the total number of crime plotted by day of the week, but our days of the week are a little bit out of order.

We have Friday first, then Monday, then Saturday, then Sunday, etc. What ggplot did was it put the days of the week in alphabetical order.

But we actually want the days of the week in chronological order to make this plot a bit easier to read. We can do this by making the Var1 variable an ordered factor variable. This signals to ggplot that the ordering is meaningful.

We can do this by using the factor function where the first argument is our variable, WeekdayCounts$Var1, the second argument is ordered = TRUE. And the third argument, which is levels,should be equal to a vector of the days

WeekdayCounts$Var1 = factor(WeekdayCounts$Var1, ordered = TRUE, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
ggplot(WeekdayCounts, aes(x = Var1, y = Freq)) + geom_line(aes(group=1))

Now, this is the plot we want. We have the total crime by day of the week with the days of the week in chronological order.

The last thing we’ll want to do to our plot is just change the x- and y-axis labels, since they’re not very helpful as they are now.

ggplot(WeekdayCounts, aes(x = Var1, y = Freq)) + geom_line(aes(group=1)) + xlab("Day of the Week") + ylab("Total Motor Vehicle Theft")

A Heatmap

We’ll add the hour of the day to our line plot, and then create an alternative visualization using a heat map.

We can do this by creating a line for each day of the week and making the x-axis the hour of the day.

We first need to create a counts table for the weekday, and hour.

table(mvt$Weekday, mvt$Hour)
##            
##                0    1    2    3    4    5    6    7    8    9   10   11
##   Friday    1873  932  743  560  473  602  839 1203 1268 1286  938  822
##   Monday    1900  825  712  527  415  542  772 1123 1323 1235  971  737
##   Saturday  2050 1267  985  836  652  508  541  650  858 1039  946  789
##   Sunday    2028 1236 1019  838  607  461  478  483  615  864  884  787
##   Thursday  1856  816  696  508  400  534  799 1135 1298 1301  932  731
##   Tuesday   1691  777  603  464  414  520  845 1118 1175 1174  948  786
##   Wednesday 1814  790  619  469  396  561  862 1140 1329 1237  947  763
##            
##               12   13   14   15   16   17   18   19   20   21   22   23
##   Friday    1207  857  937 1140 1165 1318 1623 1652 1736 1881 2308 1921
##   Monday    1129  824  958 1059 1136 1252 1518 1503 1622 1815 2009 1490
##   Saturday  1204  767  963 1086 1055 1084 1348 1390 1570 1702 2078 1750
##   Sunday    1192  789  959 1037 1083 1160 1389 1342 1706 1696 2079 1584
##   Thursday  1093  752  831 1044 1131 1258 1510 1537 1668 1776 2134 1579
##   Tuesday   1108  762  908 1071 1090 1274 1553 1496 1696 1816 2044 1458
##   Wednesday 1225  804  863 1075 1076 1289 1580 1507 1718 1748 2093 1511

This table gives, for each day of the week and each hour, the total number of motor vehicle thefts that occurred.

Let’s save this table to a data frame so that we can use it in our visualizations. We’ll call it DayHourCounts and use the as.data.frame function, run on our table, where the first variable is the Weekday and the second variable is the Hour.

DayHourCounts = as.data.frame(table(mvt$Weekday, mvt$Hour))
str(DayHourCounts)
## 'data.frame':    168 obs. of  3 variables:
##  $ Var1: Factor w/ 7 levels "Friday","Monday",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ Var2: Factor w/ 24 levels "0","1","2","3",..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Freq: int  1873 1900 2050 2028 1856 1691 1814 932 825 1267 ...

We can see that we have 168 observations, one for each day of the week and hour pair, and three different variables:

  • Var1 gives the day of the week.
  • Var2 gives the hour of the day.
  • Freq for frequency gives the total crime count.

Let’s convert the second variable, Var2, to actual numbers and call it Hour, since this is the hour of the day, and it makes sense that it’s numerical.

DayHourCounts$Hour = as.numeric(as.character(DayHourCounts$Var2))
str(DayHourCounts)
## 'data.frame':    168 obs. of  4 variables:
##  $ Var1: Factor w/ 7 levels "Friday","Monday",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ Var2: Factor w/ 24 levels "0","1","2","3",..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Freq: int  1873 1900 2050 2028 1856 1691 1814 932 825 1267 ...
##  $ Hour: num  0 0 0 0 0 0 0 1 1 1 ...

This is how we convert a factor variable to a numeric variable.

Now we’re ready to create our plot. We just need to change the group to Var1, which is the day of the week.

ggplot(DayHourCounts, aes(x = Hour, y = Freq)) + geom_line(aes(group = Var1))

You should see a new plot show up in the graphics window. It has seven lines, one for each day of the week. While this is interesting, we can’t tell which line is which day, so let’s change the colors of the lines to correspond to the days of the week.

ggplot(DayHourCounts, aes(x = Hour, y = Freq)) + geom_line(aes(group = Var1, color = Var1), size = 2)

Now in our plot, each line is colored corresponding to the day of the week. This helps us see that on Saturday and Sunday, for example, the green and the teal lines, there’s less motor vehicle thefts in the morning.

While we can get some information from this plot, it’s still quite hard to interpret. Seven lines is a lot. Let’s instead visualize the same information with a heat map.

To make a heat map, we’ll use our data in our data frame DayHourCounts. First, though, we need to fix the order of the days so that they’ll show up in chronological order instead of in alphabetical order.

So for DayHourCounts$Var1, which is the day of the week, we’re going to use the factor function.

DayHourCounts$Var1 = factor(DayHourCounts$Var1, ordered = TRUE, levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

Now let’s make our heat map.

geom_tile() is the function we use to make a heat map. And then in the aesthetic for our tiles, we want the fill to be equal to Freq. This will define the colors of the rectangles in our heat map to correspond to the total crime.

ggplot(DayHourCounts, aes(x = Hour, y = Var1)) + geom_tile(aes(fill = Freq))

You should see a heat map pop up in your graphics window.

So how do we read this? For each hour and each day of the week, we have a rectangle in our heat map. The color of that rectangle indicates the frequency,or the number of crimes that occur in that hour and on that day. Our legend tells us that lighter colors correspond to more crime. So we can see that a lot of crime happens around midnight, particularly on the weekends.

We can change the label on the legend, and get rid of the y label to make our plot a little nicer.

ggplot(DayHourCounts, aes(x = Hour, y = Var1)) + geom_tile(aes(fill = Freq)) + scale_fill_gradient(name="Total MV Thefts") + theme(axis.title.y = element_blank())

And now on our heat map, the legend is titled “Total MV Thefts” and the y-axis label is gone.

We can also change the color scheme. We’ll make lower values correspond to white colors and higher values correspond to red colors.

ggplot(DayHourCounts, aes(x = Hour, y = Var1)) + geom_tile(aes(fill = Freq)) + scale_fill_gradient(name="Total MV Thefts", low="white", high="red") + theme(axis.title.y = element_blank())

This is a common color scheme in policing. It shows the hot spots, or the places with more crime, in red. So now the most crime is shown by the red spots and the least crime is shown by the lighter areas.

It looks like Friday night is a pretty common time for motor vehicle thefts.

We saw something that we didn’t really see in the heat map before.

It’s often useful to change the color scheme depending on whether you want high values or low values to pop out, and the feeling you want the plot to portray.

A Geographical Hot Spot Map

First, we need to install and load two new packages, the maps package and the ggmap package.

library(maps)
## 
##  # maps v3.1: updated 'world': all lakes moved to separate new #
##  # 'lakes' database. Type '?world' or 'news(package="maps")'.  #
library(ggmap)

Now, let’s load a map of Chicago into R. We can easily do this by using the get_map function.

chicago = get_map(location = "chicago", zoom = 11)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=chicago&zoom=11&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=chicago&sensor=false
ggmap(chicago)

In your R graphics window, you should see a geographical map of the city of Chicago. Now let’s plot the first 100 motor vehicle thefts in our data set on this map.

ggmap(chicago) + geom_point(data=mvt[1:100,], aes(x = Longitude, y = Latitude))
## Warning: Removed 7 rows containing missing values (geom_point).

You should see the map of Chicago with black points marking where the first 100 motor vehicle thefts were. If we plotted all 190,000 motor vehicle thefts, we would just see a big black box, which wouldn’t be helpful at all.

We’re more interested in whether or not an area has a high amount of crime, so let’s round our latitude and longitude to two digits of accuracy and create a crime counts data frame for each area.

We’ll call it LatLonCounts, and use the as.data.frame function run on the table that compares the latitude and longitude rounded to two digits of accuracy.

LatLonCounts = as.data.frame(table(round(mvt$Longitude, 2), round(mvt$Latitude, 2)))
str(LatLonCounts)
## 'data.frame':    1638 obs. of  3 variables:
##  $ Var1: Factor w/ 42 levels "-87.93","-87.92",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Var2: Factor w/ 39 levels "41.64","41.65",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Freq: int  0 0 0 0 0 0 0 0 0 0 ...

This gives us the total crimes at every point on a grid. We have 1,638 observations and three variables. The first two variables, Var1 and Var2, are the latitude and longitude coordinates, and the third variable is the number of motor vehicle thefts that occur in that area.

Let’s convert our longitude and latitude variables to numbers and call them Lat and Long.

LatLonCounts$Long = as.numeric(as.character(LatLonCounts$Var1))
LatLonCounts$Lat = as.numeric(as.character(LatLonCounts$Var2))
# this is how we convert a factor variable to a numerical variable

Now, let’s plot these points on our map, making the size and color of the points depend on the total number of motor vehicle thefts.

ggmap(chicago) + geom_point(data = LatLonCounts, aes(x = Long, y = Lat, color = Freq, size = Freq))
## Warning: Removed 615 rows containing missing values (geom_point).

Now, in the plot should have a point for every area defined by our latitude and longitude areas, and the points have a size and color corresponding to the number of crimes in that area.

So we can see that the lighter and larger points correspond to more motor vehicle thefts. This helps us see where in Chicago more crimes occur.

If we want to change the color scheme, we can do that too by adding scale color gradient(low=“yellow”, high=“red”).

ggmap(chicago) + geom_point(data = LatLonCounts, aes(x = Long, y = Lat, color = Freq, size = Freq)) + scale_color_gradient(low="yellow", high="red")
## Warning: Removed 615 rows containing missing values (geom_point).

You should see the same plot as before, but this time, the areas with more crime are closer to red and the areas with less crime are closer to yellow.

We can also use geom_tile to make something that looks more like a traditional heat map.

ggmap(chicago) + geom_tile(data=LatLonCounts, aes(x = Long, y = Lat, alpha = Freq), fill = "red")
## Warning: Removed 615 rows containing missing values (geom_tile).

In each area of Chicago, now that area is colored in red by the amount of crime there. This looks more like a map that people use for predictive policing.

A Heatmap on the United States

We’ll create a heat map on a map of the United States. We’ll be using the data set murders.csv, which is data provided by the FBI giving the total number of murders in the United States by state.

Let’s start by reading in our data set. We’ll call it murders, and we’ll use the read.csv function to read in the data file murders.csv.

murders = read.csv("murders.csv")
str(murders)
## 'data.frame':    51 obs. of  6 variables:
##  $ State            : Factor w/ 51 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Population       : int  4779736 710231 6392017 2915918 37253956 5029196 3574097 897934 601723 19687653 ...
##  $ PopulationDensity: num  94.65 1.26 57.05 56.43 244.2 ...
##  $ Murders          : int  199 31 352 130 1811 117 131 48 131 987 ...
##  $ GunMurders       : int  135 19 232 93 1257 65 97 38 99 669 ...
##  $ GunOwnership     : num  0.517 0.578 0.311 0.553 0.213 0.347 0.167 0.255 0.036 0.245 ...

We have 51 observations for the 50 states plus Washington, DC, and six different variables: the name of the state, the population, the population density, the number of murders, the number of murders that used guns, and the rate of gun ownership.

A map of the United States is included in R. Let’s load the map and call it statesMap.

statesMap = map_data("state")
str(statesMap)
## 'data.frame':    15537 obs. of  6 variables:
##  $ long     : num  -87.5 -87.5 -87.5 -87.5 -87.6 ...
##  $ lat      : num  30.4 30.4 30.4 30.3 30.3 ...
##  $ group    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ order    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ region   : chr  "alabama" "alabama" "alabama" "alabama" ...
##  $ subregion: chr  NA NA NA NA ...

This is just a data frame summarizing how to draw the United States. To plot the map, we’ll use the polygons geometry of ggplot.

ggplot(statesMap, aes(x=long, y=lat, group=group)) + geom_polygon(fill="white", color="black")

You should see a map of the United States. Before we can plot our data on this map, we need to make sure that the state names are the same in the murders data frame and in the statesMap data frame.

In the murders data frame, our state names are in the State variable, and they start with a capital letter. But in the statesMap data frame, our state names are in the region variable, and they’re all lowercase.

So let’s create a new variable called region in our murders data frame to match the state name variable in the statesMap data frame.

murders$region = tolower(murders$State)

Now we can join the statesMap data frame with the murders data frame by using the merge function, which matches rows of a data frame based on a shared identifier.

murderMap = merge(statesMap, murders, by="region")
str(murderMap)
## 'data.frame':    15537 obs. of  12 variables:
##  $ region           : chr  "alabama" "alabama" "alabama" "alabama" ...
##  $ long             : num  -87.5 -87.5 -87.5 -87.5 -87.6 ...
##  $ lat              : num  30.4 30.4 30.4 30.3 30.3 ...
##  $ group            : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ order            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ subregion        : chr  NA NA NA NA ...
##  $ State            : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Population       : int  4779736 4779736 4779736 4779736 4779736 4779736 4779736 4779736 4779736 4779736 ...
##  $ PopulationDensity: num  94.7 94.7 94.7 94.7 94.7 ...
##  $ Murders          : int  199 199 199 199 199 199 199 199 199 199 ...
##  $ GunMurders       : int  135 135 135 135 135 135 135 135 135 135 ...
##  $ GunOwnership     : num  0.517 0.517 0.517 0.517 0.517 0.517 0.517 0.517 0.517 0.517 ...

We have the same number of observations here that we had in the statesMap data frame, but now we have both the variables from the statesMap data frame and the variables from the murders data frame, which were matched up based on the region variable.

So now, let’s plot the number of murders on our map of the United States.

ggplot(murderMap, aes(x=long, y=lat, group=group, fill=Murders)) + geom_polygon(color="black") + scale_fill_gradient(low="black", high="red", guide = "legend")

You should see that each of the states is colored by the number of murders in that state. States with a larger number of murders are more red. So it looks like California and Texas have the largest number of murders. But is that just because they’re the most populous states?

Let’s create a map of the population of each state to check.

ggplot(murderMap, aes(x=long, y=lat, group=group, fill=Population)) + geom_polygon(color="black") + scale_fill_gradient(low="black", high="red", guide = "legend")

If you look at the graphics window, we have a population map here which looks exactly the same as our murders map. So we need to plot the murder rate instead of the number of murders to make sure we’re not just plotting a population map.

So in our R Console, let’s create a new variable for the murder rate and redo our plot with the fill equal to MurderRate.

murderMap$MurderRate = murderMap$Murders/murderMap$Population*100000 # murders per 100,000 population
ggplot(murderMap, aes(x=long, y=lat, group=group, fill=MurderRate)) + geom_polygon(color="black") + scale_fill_gradient(low="black", high="red", guide = "legend")

If you look at your graphics window now you should see that the plot is surprisingly maroon-looking. There aren’t really any red states. Why?

It turns out that Washington, DC is an outlier with a very high murder rate, but it’s such a small region on the map that we can’t even see it.

So let’s redo our plot, removing any observations with murder rates above 10, which we know will only exclude Washington, DC. Keep in mind that when interpreting and explaining the resulting plot, you should always note what you did to create it: removed Washington, DC from the data.

ggplot(murderMap, aes(x=long, y=lat, group=group, fill=MurderRate)) + geom_polygon(color="black") + scale_fill_gradient(low="black", high="red", guide = "legend", limits=c(0,10))

Now if you look back at your graphics window, you can see a range of colors on the map.

We have seen how we can make a heat map on a map of the United States, which is very useful for organizations like the World Health Organization or government entities who want to show data to the public organized by state or country.

The Analytics Edge

Let me comment on the merits of heatmaps as a way of representing data in the context of representing crime activity.

Criminal activity-related data often has both components of time and location. Sometimes, all that is required is a line chart, but heatmaps can visualize data that will be too big for a table.

Plotting data on maps is much more effective than a table for location based data, and is eye catching.

What is the edge of predictive policing?

Many police forces are exploiting their databases to focus finite resources on problem areas. Not only do analytics help improve policework, the outputs are also good communication tools to decision makers in government, and the wider public.